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Abstract 

We develop efficient simulation techniques for Bayesian inference on switching GARCH 
models. Our contribution to existing literature is manifold. First, we discuss diflferent 
multi-move sampling techniques for Markov Switching (MS) state space models with 
particular attention to MS-GARCH models. Our multi-move sampling strategy is based 
on the Forward Filtering Backward Sampling (FFBS) applied to an approximation of 
MS-GARGH. Another important contribution is the use of multi-point samplers, such 
as the Multiple- Try Metropolis (MTM) and the Multiple trial Metropolize Indepen- 
dent Sampler, in combination with FFBS for the MS-G ARGH process. In this sense 
we extend to the MS state space models the work of|So| [20061 ] on efficient MTM sam- 



pler for continuous state space models. Finally, we suggest to further improve the 



sam pler efficiency by introdu cing the antithetic sampling of 



and 



Grain and Lemieux 



Graiu and Meng 



2005 1 



2007( 1 within the FFBS. Our simulation experiments on MS- 



GARGH model show that our multi-point and multi-move strategies allow the sampler 
to gain efficiency when compared with single-move Gibbs sampling. 
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1 Introduction 



The study of financial markets volatility has remained a prominent area of research in fi- 
nance given the important role it plays in a variety of financial problems (e.g. asset pric- 
ing and risk management) challenging both investors and fund managers. A remarkable 
amount of work, ranging from model specification in discrete and continuous time to esti- 
mation technique s and finally to a pplications, have been proposed in the literature. Among 
volatility models, 



BoUerslev 



1986| Generalized Autoregressive Conditional Heteroskedastic 
(GARCH) model and its variants ranks as the most popular class of models among practi- 
tioners. However, from empirical studies, this class of models have been well documented 
to exhibit high persistence of conditional variance, i.e. the pro cess is close to being non- 



stationary (nearly integrated). 



Lamoureux and Lastrapes 



1990[, among others, argue that 



the presence of structural changes in the variance process, for which the standard GARCH 
process cannot account for , may be responsible for this phenomenon. To buttress this point, 



Mikosch and Starica 



2004l | estimate a GARCH model on a sample that exhibits structural 



changes in its conditional variance an d obtained a nearly integrate d G ARCH effec t from the 



estimate. Based on this observation. 



Hamilton and Susmel 



199J and 



Cai 



1994l | propose a 



Markov Switching- Autoregressive Conditional Heteroskedastic (MS- ARCH) model, governed 
by a state vari able that follo ws a first order Markov chain to capture the high volatility per- 
sistence, while 



Gray 



19961 considers a Markov Switching GARCH (MS-GARCH) model 
since it can be written as an infinite order ARCH model and may be more parsimonious 
than the MS- ARCH model for financial data. 



The class of MS-GARCH models is gradually becoming a work house 



among economi cs 



Marcucci 



2005 



and financial practitioners for analysing financial markets data (e.g., see 
For practical implementation of this class of theoretical models, it is crucial to have reliable 
parameter estimators. Maximum Likelihood (ML) approach is a natural route to parameter 
estimation in Econometrics. However, the ML technique is not comp utationally fe asible 



for MS- GARCH models beca use o f the path dependenc e problem (see 



this end. 



Henneke et al 



201lj and 



Bauwens et al. 



3 



Gray 



19961). To 



2010| propose Bayesian approach based 



on Markov Chain Monte Carlo (MCMC) Gibbs technique for estimating the parameters of 
Markov Switching- Autoregressive Moving Average- Generalized Autoregressive Conditional 
Heteroskedastic (MS-ARMA-GARCH) and MS-GARCH models respectively. Their pro- 
posed algorithm samples each state variable given others individually (single-move Gibbs 
sampler). This sampler is slowly converging and computationally demanding. Great at- 
tention have been paid in the literature at improving such inefficiencies in the context of 
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continuous and possibly non-Gaussian and nonlinear sta t e space models. See, for exam ple 



Frtihwirth- Schnatter 



Carter and Kotm 



199J, 



Koopman and Durbin 



Jong and Shepharc 



De Joiig anc 

Qd sJsQoe 



19951 and 



for multi-points and 



1994l | for multi-move Gibbs sampler and 
multi-move Gibbs sampling schemes for continuous and nonlinear state space models. To 
the best of our knowledge there are few works on efficient multi-move sampling scheme for 



discrete or mixed state space models. See 
Gibbs for conditionally linear models 



Kim and Nelson 



Billio et al 



3 



199911 for a review on multi-move 



1999l | for glob al Metropolis Hastings al 



gorithm for sampling the hidden states of MS-ARMA models and 



Fiorentini et al. 



2013 for 



multi-move sampling in dynamic mixture models. As regards MS-GARCH models, 



Ardia 



2008 develops a Gib bs sampling scheme for the joint sampling of the state variables for 



2008 


develops 


the 


Haas et al. 


He and Maheu 



201C 



2004 model, which is a particular approximation of a MS-GARCH model. 



propose a Sequ ential Monte Carlo (SM C) algorithm for GARCH mod- 



els subject to structural breaks, while 



Bauwens et al. 



2011 



propose a Particle MCMC (PM- 



CMC) algorithm for estimating GARCH models subject to either structural breaks and 



regime switching. 



Dufavs 



2012 



, on the other hand, propose a Metropolis Hastings algo- 
rithm for block sam pling of the hidden state of infinite state MS-GARCH models. See also 
for an alternative approach, i.e. Viterbi-Based technique, for sampling 



Elliott et al 



2012 



the state variables of MS-GARCH models. 



In this paper, we develop an efScient simulation based estimation approach for MS- 
GARCH models characterized by a finite number of regimes wherein the conditional mean 
and conditional variance may change over time from one GARCH process to another. We 
follow a data augmentation framework by including the state variables into the parameter 
vector. In particular, we propose a Bayesian approach based on MCMC algorithm which 
allows to circumvent the problem of path dependence by simultaneously generating the 
states (multi-move Gibbs sampler) from their joint distribution. Our strategy for sampling 
the state variables is based on Forward Filtering Backward Sampling (FFBS) techniques. 
As for mixed hidden state models, FFBS algorithm cannot be applied directly on switching 
GARCH models, we suggest the use of a Metropolis algorithm with an FFBS proposal 
generated using an auxiliary model. We propose and discuss different auxiliary models 
obtained by alternative approximations of the MS-GARCH conditional variance equation. 

Another original contribution of the paper relates to the Metropolis step for the hid- 
den states. To efficiently estimate MS- GARCH m odels we consider the class of generalized 
(multipoint) Metropolis algorithms (see 



Metropolis-Hastings (MH) approach ([Hastings 



Liu 



2002 



Chapt 



3 



er 5) which extends the stan dard 



19701 and 



Metropolis et al. 



). See 



Liu 



3 



2002l | and iRobert and Casellal |2007l | for an introduction to MH algorithms and a review of 
various extensions. Multipoint samplers have been proved, both theoretically and compu- 
tationally, to be effective in improving the mixing rate of the MH chain and the efficiency 
of the Monte Carlo estimates based on the output of the chain. The main feature of the 
multipoint samplers is that at each iteration of the MCMC chain the new value of the chain 
is selected among multiple proposals, while in the MH algorithm one accepts o r rejects a 



single proposal. In this paper we apply the Multiple- Try Metropolis (MTM) (see 



Liu et al 



2000l | ) and some modified MT M algorithms. The superio rity of the MTM over standard MH 



algorithm has been proved in 



Grain and Lemieux 



20071 ■ which also propose to apply anti- 



the tic an d quasi-Monte Carlo techniques to obtain good proposal distributions in the MTM. 
203 

applies MTM to the estimation of latent-variable models and finds evidence of 



So 



superiority of the MTM over standard MH samplers for the latent variable estimation. The 
author also find s that the efficiency of MTM can further be increased by the use of multi- 



move sampling. 



Casarin et al 



2Q12| apply the MTM transition to the context of interacting 



chains. They provide a comparison with standard interacting MH and also estimate the gain 
of efficiency when using interacting MTM combined with block-sampling for the estimation 
of stochastic volatility models. We thus combine the MTM sampling strategics with the 
approximated F FBS techniques for the Markov switching process. In this sense, we extend 
the work of 



So 



2006l | to the more complex case of Markov-switching nonlinear state space 
models. In fact, the use of multiple proposals is particularly suited in this context where 
the forward filter is used at each iteration to generate only one proposal with a large com- 
putational cost. The use of multiple proposals based on the same run of the forward filter is 
thus discussed. We also ap ply to this context the antithetic sampling technique proposed by 



Craiu and Lemieux 



2007f to generate correlated proposal within the Multiple-try algorithm. 



and suggest a Forward Filtering Backward Antith etic Sampling (FFBAS) algorithm which 



Craiu and Mens 



20051 with FFBS and 



combines the permuted displacement algorithm of 
possibly produces pairwise negative association among the trajectories of the hidden states. 
Note that our approach could easily extended to other discrete or mixed state space models. 

The paper is organized as follows. Section 2 introduces the MS-GARCH model and 
discuss inference issues related to existing methods in the literature. In Section 3, we present 
the Bayesian inference approach and explain the multi-move multipoint sampling strategies. 
In Section 4, we study the efficiency of our estimation procedure through some simulation 
experiments. In Section 5, we conclude and discuss possible extensions. 
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2 Markov Switching GARCH models 



2.1 The model 

A Markov Switching GARCH model is a nonhnear specification of the evolution of a time 
series assessed to be affected by different states of the world and for which the conditional 
variance in each state follows a GARCH process. More specifically, let yt be the observed 
variable (e.g. the return on some financial asset) and St a discrete, unobserved, state variable 
which could be interpreted as the state of the world at time t. Define {ys, ■ ■ ■ ,yt) and 
(ss, . . . , Si) as ys:t and Sg-t respectively whenever s <t and otherwise. Then 

yt ^ fJ'tiyi-.t-i^Ofj.ist)) + at{yi:t-i,0aist))vt, '7t~A^(0, 1), (1) 

aUyi:t-i,9a{st)) = -f{st) + aist)ej_, + P{st)a^_iiyi,t-2,0a{st-i)), (2) 

where, et = at{yi:t-i,d„{st))r]t, 0„{st) = {'j{st),a(st), l3{st)), 7(si) > 0, a(st) > 0, /3(sf) > 
0, and St G {1, . . . , A/}, t = 1, . . . , T, is assumed to follow a M-state first order Markov 
chain with transition probabilities {T^ij,t}i,j=i,2,...,M- 

M 

TTij^t ^ P{st ^ i\st-l = j,yi:t-l,d-„), ^7rij,t = l V j = 1,2,...,M. 

i=l 

The parameter shift functions 7(st), a{st) and l3{st), describe the dependence of parameters 
on the realized regime st i.e. 

M M M 

list) = ^ Jmht=vi, Ct{st) = ^ a,nlst=vi, and /3(st) = ^ /3mlst=m, 
m— 1 m— 1 m— 1 

where, 

{1, if St = m 
0, otherwise 

By defining the allocation variable, st, as a A/-dimensional discrete vector, ^t = i^it, ■ ■ ■ , Cm*)', 
where ^,„t — ist=m, ni ~ 1, . . . , Af, the system of equations in ([T|)-([2]) can be written com- 
pactly as 

yt=Myl:t-l,et^^.)+<Jt{yl:t-l,eA)rlt, vt N{0,1), (3) 

aUyi:t-i,COa) = iCl) + (e^a)eLl + (e^/3)a?_i (2/i:t-2, e^-l^?.), (4) 
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where = atiyi-.t-i.Ct&crht, 1 = (717 ■ • • , 7m)', a = [ax, . . . .um)' , P = {Pi, . . . , (3m)' , 
= {Oit_,,. ■ ■ ,0m^)' and 6^ = [Oic,, ■ ■ ■ ,OMa)' with = (7m, am, /3m)' for m = 1,...,M. 
for i = 1, . . . , T. Let TTt = (ttk, . . . , ttm*), with TTit = (tTji,*, • • . , tt^m,*) for i = 1, 2, . . . , M and 
X]f=i ■"'u,* ~ 1 for all j = 1, 2, . . . , M. Since follows a M— state first order Markov chain, 
we define the transition probabilities {'nij^t\i,j=i,2,...,M by 

where is the i— th column of a M-by-M identity matrix. The conditional probability of 
given ^t_i, 6*^ and yi-.t-i is given by 

M 

P(C^I?Ll,2/l:t-l,^^) = n (^mt^-O^-S (5) 
m— 1 

which implies that the probability with which event m occm's at time t is TTmt^t-i- 



2.2 Inference Issues 



Estimating Markov switching GARCH models is a challenging problem since the likelihood 
of Ut depends on the entire sequence of past states up to time t due to the recursive structure 
of its volatility. To elaborate on this, the likelihood function of the switching GARCH model 
is given by 

M M 

mvi-.r) ^ f{yi:T\0) ^Y.---T. f(y^--T^ = e^, • • ■ , = l^) (6) 

where 9 = {{0m^l,0ma■}m=l....,M,0^,). Setting S,s.,t = (Cs,---,Ct) whenever s < t, the joint 
density function of yi;t and ^i;t on the right hand side of equation (O is 



f{yi:T, ^1:t\0) = f{yi\^l:l,e^, 9„) /(l/t 1^1:*- 1 , 6:* , 0„)p{^t\yi:t-l , ^l:t-l , 0^) 



t=2 



t=2 \i=l ) 



(7) 



with, 



!{yt\y\:t-\,i\:t,Qyi,9a) OC — ^Xp 
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Given ai , recursive substitution in equation (U) yields 



(8) 



i=0 



1=0 



Equation (|5|) clearly shows the dependence of conditional variance at time t on the entire 
history of the regimes and by inference the dependence of the likelihood function on the 
entire history of the regimes. The evaluation of the likelihood function over a sample of 
length T, as can be seen in equation ([B]), involves integration (summation) over all M'^ 
unobserved states i.e. integration over all Af^ possible (unobserved) regime paths. This 
requirement makes the maximum likelihood estimation of equation ^ infeasible in practice. 

Two major approaches have been developed in the literature in order to circumvent this 
path dependence problem. One approach involves the use of model approximation while the 
other is simulation based. 



As regards to the model approximation approach, ICail jl994l | and iHamilton and Susmel 



1994l | approximated the MS-GARCH model by an MS-ARCH model. This approach effec- 
tively makes the model tractable because the lagged conditional varian ce that makes the con- 
dition al variance dependent on the history of r egime has be en dropped. 



2OO2I I employed the algorithm developed by 



Chib 



3 



Kaufman and Friihwirth-Schnatter 



19961 for a Markov mixture models to 



compute the marginal likelihood of the MS- ARCH model but noted that this methodology 
cannot be carried over to the MS-GARCH model be cause of the path dependence problem. 
Another approximation approach can be credited to 



Gray 



1996 who noted that the condi- 



tional density of the return is essentially a mixture of distributions with time- varying mixing 
parameter and in particular under normality assumption he suggested the use of aggregate 
conditional variances over all regimes as the lagged con ditional varia nce when constructing 
the conditiona l variance at eac h tim e step. Extension s of 



Dueker 



1997 1 



Klaassen 



2002t and 



Haas et al 



Gray 



3 



19961 model can be found in 



2004| among others. 



Abramson and Cohen 



2007| provide stationarity conditions for some of these approximations. The problem with 



this approach is that these approximations cannot be verified. 



Among the simulatio n based approaches pr oposed in the literature there is the Bayesian 



estimation technique by 



Bauwens et al 



2010|. In particular, they develop a single-move 



MCMC Gibbs sampler for a Markov switching GARCH model with a fixed number of 
regimes. The authors also provide sufficient conditions for geometric crgodicity and ex- 
istence of moments of the process. Their estimation approach, though quite promising, has 
one main limitation that has rendered it unattractive. The single-move Gibbs sampler is 
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inefficient i.e. draws from tlie single-move sclieme are noted to be Iriglily correlated and thus 
slow down the convergence of the Markov chain. An alternative simulation based approach 
is the particle filter approach proposed by 



He and Maheu 



3 



2010|. They develop a sequen- 



tial Monte Carlo method for estimating GARCH models subject to an unknown number of 
structural breaks. 



In the next section, we propose an efficient Bayesian estimation procedure for estimating 
the parameters of MS-GARCH models by simultaneously generating the whole state vector. 



3 Bayesian Inference 

Based on the aforementioned inference issues associated with MS-GARCH models, we present 
a Bayesian approach based on MCMC Gibbs algorithm which allows us to circumvent the 
path dependence problem and efficiently sample the state trajectory. The purpose of this 
algorithm is to generate samples from the posterior distribution which are then used for its 
characterization. We follow a data augmentation framework by treating the state variables 
as parameters of the model and construct the likelihood function assuming the states known. 

Before proceeding with the elicitation of our proposed Bayesian technique, it is important 
that we make explicit the parametric specification of the conditional mean, /it(?/i:t_i, ^^0^), 
of the return process yt in equation ([3|) and the transition probabilities p{S,[\SJi_i, yi-.t-i, Ott)- 
Since our major aim is to define a technique for sampling the state variables efficiently, which 
in turn will affect other parameter estimates, we assume for expository purposes a conditional 
mean defined by a constant switching parameter given by ^^/i where fi = {^i,...,^m)' 
and constant transition probabilities. Alternative specification such as switching ARMA 
process could be thought of fo r the conditi onal mean and time varying transition probabilities 
may be defined by following iGrav 



1996l | approach, i.e. specifying transition probabilities 
as a function of past observables. Under this specification, the augmented parameter set 
of our model consists of ^i:Tj ~ (^/^, ^o-, ^tt) where 9^ ~ ^, 6-^ = {{T^m}m=i m) and 

= ({^mcr}m=l,...,J\/) with Oma = (7m, Q!™, 7r„i = (7r,„i, ■ • ■ , T^mM) and X]m=l ""mm* = 

1 V m* — 1,...,A/. The prior distributions of the parameter vector are assumed to be 
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independent and chosen as follows 

M 

^''""11 Diriclilet(i^i„, . . . , VMm) 

m—1 
M 

m—1 
M 



where vim, • ■ • , J^Mm, a™^, 6™^, amjjbmj, ama,bma, amp,b,np V m = 1, . . . , M are hyperpa- 
ramcters to be defined. The supports of the prior di stribution of O^i and Og will be chosen to 



avoid label switching (identifiability restriction). See 



Friihwirth-Schnattei 



20061 for an intro- 



ductio n to label switching problem for dynamic mixtures and MS models an dBauwens et al 
20ic| for illustration of the identification constraint for MS-GARCH models. The choice of 



the prior supports also helps in preventing regime degeneration. The joint prior distribution 
is thus proportional to 

M 

f{6) oc J]^ Dirichlet(t/i„, . . . , VMm)- (9) 

m—1 

The posterior density of the augmented parameter vector given by 

f{0,(.l:T\yi:T) OC fiyi:TAl:T,0) 

(10) 

= fiyi:T\^l:T,0)fi^,..T\9)f{9). 

cannot be identified with any standard distribution, hence we cannot sample directly from 
it. Using Gibbs sampler, we can generate samples from this high-dimensional posterior 
density. This will be done by itcratively sampling from the following three full conditional 
distributions 

• P{^l:T\0,yi:T), 

• f{e^\e^,0g,^l..T,yi:T) = f{On\^l:T), and 

• fiOa,O^\0^,^,.,T,yi:T) = f{Oa,0^\^l:T,yi:T). 



These full conditional distributions are easier to manage and sample from because they can 
either be associated with a known distribution or simulated by a lower dimensional auxiliary 
sampler. In the following subsections we present in details our sampling procedure. 
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3.1 Sampling the state variables ^i-^t- 

To sample ^i;t using the single move algorithm, one relies on computing 

M T 
pte|ei:t-l,et+l:T,e,yi:T) « H i^m^t)^""''^' l[ f {vA^j , 9 , y,.., (11) 

m— 1 J— i 

for each value in {cm : fn = 1, ■ • • , M} and dividing each evaluation by the sum of the 
M points to get the normalized discrete distribution of from which to sample. Sampling 
from such a distribution once the probabilities are known is similar to sampling from a 
Multinomial distribution. On the other hand, the full joint conditional distribution of the 
state variables, £,i:t, given the parameter values and return series 

P{^1:t\0, yi:T) « /(yi:T|a:T, 0)p{^1:t\O) (12) 

is a non-standard distribution. Therefore multi-move sampling is not feasible. For this 
reason, we consider a generalization of MH (i.e. multipoint Metropolis-Hastings) strategy 
for generating proposals for the state variables. Multipoint samplers are designed to consider 
multiple proposals at each iteration of an MH and to choose the new value of the chain from 
this trial set. The multi-move and multipoint sampling procedures arc of interest because 
of their potentials at addressing issues associated with multi-modality of the target function 
(i.e. in the event that the target distribution is multi-modal in nature the MCMC chain 
runs the risk of getting trapped in local modes) and autocorrelation of samples from the 
Metropolis-Hasting's chain. Our scheme generally involves running a FFBS on the auxiliary 
sampler to generate several proposals at each iteration step. Let the proposal distribution 
be denoted by 

T-l 

qiil:T\e,yi:T) = q{eT\e,yi:T) [J QiClCt+l, , yi:t) , (13) 

where q{CtWt+i^^^ Vi'-t) °^ qiCtlvi-.t, d)q{xt+i\xt,0) with g(C^|yl:t, 0) representing filtered prob- 
ability. A discussion on the proposal distribution is presented in section 3.2. In the following, 
we discuss the three multipoint algorithms considered in this paper. 



3.1.1 Multiple- Try Metropolis Sampler 



Liu et al. 



3 



20001 suggest the Muhiple-Try Metropolis (MTM) sampler scheme. As in the 



general case of multipoint samplers, their idea is to consider several points generated by a 
proposal distribution so that possibly a larger region from which the new value for the chain 
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is chosen can be investigated. By using the muhiple-try strategy, it is easier for the iterates to 
jump from one local maximum to another and thus speed up the convergence to the desired 
target distribution. Samples from the proposal distribution will be generated by FFBS 
algorithm. We present below a sketch of the main ingredients ne eded in Forward Filter (FF) 
and Backward Sampling (BS) algorithm and refer the reader to 



Friihwirth-Schnattei 



2ooe 



for detailed presentation of this procedure. At time t, given and yi;t the FF probabilities 
are obtained by first computing the one-step ahead prediction 

M / AI \ 

then, the FF is 

q{itW,yi:t} - — — — (14) 

J2i=i9{ytWt = e-,6',yi:t-i)g(Ct = e[\e,yi.,t-i) 

where g{yt\£,t,0,yi;t~i) is the conditional density of the return process under the auxiliary 
model. Using the output of the FF, we compute q{£,T\(^,yi:T) and 

qi^t\^t+i,0,yi:t) = ^ — i. . , (15) 

for t = T — 1, T — 2, . . . , 2, 1. Then at each time step we sample from q{(,'rp\0, yi-.r) and 
Ct from 6*, yi.t) iteratively for t = T - 1, T - 2, . . . , 2, 1. This is the BS step. The 

BS procedure is implemented by first noting that ^t+i is the most recent value sampled for 
the hidden Markov chain at t + 1 and since can take one of ei, . . . , ejv/, we compute the 
expression in equation p5|l for each of these values. Then sampling from 9(CtlCt+ii 2/i:t) 
once the corresponding probabilities for ~ e[ for i = 1 , . . . , Af are known may be compared 
to sampling from a multinomial distribution. Note that at each iteration step of the MCMC 
procedure we only need a single run of the Forward Filter (FF) for generating generating 
multiple proposals using Backward Sampling (BS). 

A summary of our MTM algorithm is given in algorithm 1. 



Algorithm 1 MTM Sampler 

i. Choose a starting value ^i.rp. 

ii. Let Ci^T t)6 the value of the MTM at the (r — l)-th iteration. 
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iii. Construct a trial set {^i.t.i, ^1:T.2, ■ ■ ■ , ^1:T,k} containing K state variable paths drawn 
from the proposal distribution q{£,i:T\&''''~^\yi:T)- 



iv. Evaluate 



V. Select ^i-T from {^i:t.i,^1:T.2 ■ ■ ■ ,Ci:T,if } according to the probability 



Pk = 



Wk{^UT,k,&^) 



EtlWki^l:TM,^\^.T 



yk = i,...,K. 



vi. Construct a reference set {^jj'.j^ i, 21 ■ • ■ i t k} setting the first K — 1 elements 
to a new set of samples drawn from the proposal distribution q{^i:T\0^^^^\yi:T) and 
the K—th element Ci tat ^'^ ^i't^'- 



vii. Draw u ^ l^[o,i] 

viii. Set 



where, 



Ar) 
Sl:T 



Cl:T if U < a{il:T,^'lT^^) 

^i^T^'^ otherwise 



"(?i:T,4i:T ) = mm 



V ' Ek=iWk{e^..T.„ii:T) J 



Observe that the MTM algorithm reduces to standard Metropolis-Hasting algorithm 
when K = 1. We also note that alternative weight function other than the importance 
weight function assumed in the MTM algorithm presented above could be defined. 

3.1.2 Multiple-trial Metropolized Independent Sampler (MTMIS) 

As we are using independent proposal distributions in the MTM algorithm, the generation 
of the set of reference points is not needed to have a po ssibly more efficient generalized 



Liu 



200a | we combine the MTM with the 



MH algorithm. Thus, following the suggestion of |] 
metropolized indpendent sampler and obtain Algorithm 2. The main advantage is that one 
can use multiple proposals without generating the reference points, obtaining thus a decrease 
of the computational complexity of the algorithm. 
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Algorithm 2 Multiple-trial Metropolized independent Sampler (MTMIS) 



i. Choose a starting value £,i.rp. 

ii. Let ^^^'^ value of the MTM at the (r — l)-th iteration. 

iii. Construct a trial set {^i:t.17 ^1:T.2, ■ • ■ , Ci-.t.k} containing K state variable paths drawn 
from the proposal distribution. 

iv. Evaluate 

^fefe:T.fc) ^ ^f/'^-'\Tr~i)'^'''^l Vk^l,...,K, and define = ^ M^fc(a:T,fc) 

v. Select ^i-T from {^i:t.1i^1:T.2 ■ ■ ■ ,^1:T,k} according to the probability 

W^fc(Cl:T,fc) 

vi. Draw u ~ W[o,i]- 

vii. Set 



where, 



Ar) 
S1:T 



a(Ci:T,'?lyT^^) = mill I 1' 



^i^T otherwise 



W 



3.1.3 Multiple Correlated- Try Metropolis (MCTM) Sampler 

To further improve the efficiency the MTM algorithm and to ensure that a larger portion 
of the sample space is explored for better mixing and shorter running time, we propose 
the use of correlated proposals. There are various ways of introducing correlation among 
proposals e.g. antithetic and stratified approaches. In this paper, we study the antithetic 
approach. The use of antithet i c sam pling in a Gibbs sampling context allows for a gain of 
efficiency. 



Pitt and Shephard 



3 



1996l | propose a blocking method with antithetic approach 
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for non-Gaussian state space models, 



Holmes and Jasra 



3 



2009l | propose a scheme for re- 



ducing the variance of estimates fro m the standard Metropolis -within-Gibbs sampler by 



Biziaieva and Olsson 



2008l | propose a forward filter- 



introducing antithetic samples while 
ing backward smoo t hing particle filter algorithm with antithetic proposal. Here we follow 



Craiu and Lemieux 



2007l | which use antithetic proposals within a multi-point sampler and 
apply their idea to the context of discrete state space models. We propose a correlated 
proposal MTM sampler based on a combination of the FFBS and antithetic sampling tech- 
niques. To the best of our knowledge, antithetic proposals of this kind have not been used 
in the context of Markov switching nonlinear state space models. The idea is to choose, 
at each step of the MCMC algorithm, a new hidden state trajectory from ne gatively co rre- 



lated proposals instead of independent proposals. Following the suggestion of 
obtain Algorithm 3. 



Liu 



2002 



we 



Algorithm 3 Muhiple Correlated- Try Metropolis (MCTM) Sampler 

i. Choose a starting value ^it- 

ii. Let Ci^T be the value of the MTM at the (r — l)-th iteration. 

iii. Construct a trial set {(,i:t,i,^1:T.2, ■ ■ ■ Ai-.t.k} containing K correlated state variable 
paths drawn from the proposal distribution. 

iv. Evaluate 

p{Ci:T,i\0^'^-^\yv.T) 



qi^l:T,l\0^^~^\yi:T)' 



W tC \ W /C ^ P{^l:T,k-l\d^ \yi:T) w ; o 

9(Cl:T,fe-l fc'"^ ^^2/l:T) 



Select ^i-T from {^i:t.i, Ci:T.2 • ■ • ,£,1:T,k} according to the probability 
Pk = -(^ny-. Vfc = l,...,if. 

Efcl W^fe(6:T,l:fc,^l:T ) 



vi. Supposing ~ ^1:T,i is chosen in item (v) above, create a reference set 
{?i:T,i' ^i:T,2' • ■ • ' ^I.t^k} by letting 
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and drawing Q.t j for j — I + I, . . . , K from the proposal distribution, 
vii. Draw M Z//[o,i]- 



Set 



where, 



Sl:T 



«(fl:T,C[ 



T I 



(i-T^^ otherwise 



1 EtiWki^i-.TMk.^r^.r'^) 



The simplest way to introduce negative correlation between the trajectories generated 
with the FFBS algorithm is to use, at a given iteration r of the sampler and for the t-th 



hidden state, a set of K uniform rando m numbers [/^ 



(r) 



the pe rmuted displacement method (see 



t.k 1 



1, 



Arvidsen and Johnsson 



K ge nerated following 



198a | and 



Grain and Mens 



2005j) given in Algorithm 4. The uniform random numbers are then use within the BS 



procedure to generate correlated proposals. 



Algorithm 4 Permuted displacement method 

• Draw ri ^ Z^[o.i] 

• For fc = 2, . . . , A' — 1, set = + 1/2J where [xj denotes the fractional part of 



• Set rK = l- {2^-2^1} 

• Pick at random a G Sk, where Sk is the set of all possible permutation of the integers 
{!,..., A'} 



For fc = 1, . . . , AT, set [/fc = 



(fe) 



For K = 3, 



Craiu and Mens 



show that the random numbers generated with the 
permuted displacement method are pairwise negatively assoc i ated ( PNA). The definition of 



PNA given in the following is adopted from 



Craiu and Mens 
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Definition 3.1 (pairwise negative association). The random variables ^t,i,£,t,2-- ■ ■ ,£,t,K are 
said to be pairwise negatively associated (PNA) if, for any nondecreasing functions /i, /2 
and (i, j) G {1, . . . , K}"^ such that i ^ j 

Covih{^t,^)J■2{it,J))<0 

whenever this covariance is well defined. 

The proof for the case K > 4 is still an open issue. For this reason we consider in our 
algorithm K < 3. The presence of PNA in the case oi K > 4 proposals depends on the 
degrees of uniformity of the filtering probability and the gain of efficiency should be proved 
computationally in each applications. 

We use the permuted sampler to generate K = 2 multi-move and correlated proposals 
in the backward sampling step of the FFBS. In order to show how the antithetic sampler 
works, we consider the case where the hidden Markov switching process has two states, 
i.e. = (Cit,^2t)' and for notational convenience let {qf^''}t=i:T be the sequence of filtered 
probabilities of being in state 1 at the r-th iteration of the sampler, then we define the 
backward antithetic samples ^t,i and ^t,2 as follows 

(r) (r) 

where ~ 1 — . It is possible to show that 

Using the expected value of the square of the Euclidean distance, (i(^t,i, ^(,2), between this 
two antithetic samples to investigate the nature of the antithetic samples, extremely anti- 
thetic proposals is obtained when the distance on average is optimal. 

^^[^'(6,1,6,2)] = 2-2 ((2gW - 1)1^,.,^, + (1 - 2gM)I^(.,^,) . (16) 

. . ... . (r) 

From equation (jl6|) extreme antithetic is attained when ql is equal to 0.5, which can be 
easily found in applications where regimes exhibit similar persistence level.. 
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3.2 Auxiliary models for defining the proposal distribution 



In order to built proposal distributions for the state variables, we will exploit all the knowl- 
edge we have about the full conditional distribution. The first step is to approximate the 
MS-GARCH model by eliminating the problem of path dependence and then deriving a 
proposal distribution for state variables from the auxiliary model does obtained. A possible 
way of circumventing the path dependence problem inherent in the MS-GARCH model is 
to replace the lagged conditional variance appearing in the definition of the GARCH model 
with a proxy. A look into the literature shows different auxiliary models which differs only 
by the content of the information used in defining the proxy used in each case. In general, 
various MS-GARCH (as available in the literature) can be obtained by approximating the 
conditional variance 

<^tiyi.t-l,Sa{St)) = V[yt\yi:t-l,Sl:t] = V[et\yi:t-1 , Sl-.t] 

of the GARCH process as follow 



(17) 



In the subsection we present alternative specifications of e(x)t-i and cr'^x)t-i that define 
different approximations of the MS-GARCH model. The variable X can take on a ny of 
B, G, D, SK, K with e ach nota t ion re presenting, respectively, the Basic approximation. 



199a | approximation. 



approximation and 



Duekei 



Klaassen 



1997B approximation. Simplified version of 

"T 



Klaassen 



Gra 



200 



2002| approximation. 



3.2.1 Model 1 



As a first attempt at eliminating the path dependent problem, we note that the conditional 
density of et is a mixture of normal distribution with zero mean and time varying variance. 
Hence, we approximate the switching GARCH model by replacing the lagged conditional 
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variance, a^_i, with the variance c^^jj.^ of the conditional density of i.e. 

^{B)t-i = yt-1 — M(B)t-i 

M(B)t-l = E[flt-l{yi:t-2,£,t-ldtJ.)\yi:t~2] = E[yt_i\yi,t-2] 
M 

^lB)t~l = E[(j'^^_^{y 1:1-2, £.'t^iOa)\yi:t -2] = £^[et_i |?/l:t_2] = V {et-l\yi:t-2) 
M 

= ^t-iiyi:t-2,e'„,e^)q{Ct-i = e;„|yi:t_2). 

m—1 

Observe that in this approximation scheme M(_B)t-i and o''^g-ji_i are functions of yi:t_2 
and the information coming from yt-i is lost. With = e^|j/i:f_2) known for m ~ 

1, . . . , M, ijL(^B)t-i can easily be computed while can be computed recursively since 

o'?_i(2/i:t-2, e^0(j) depends on c^^jj_2. Note that in this approximation the conditioning is 
on yi:t-2- This approach represents a starting point for other approximations hence we tag 
it Basic Approximation. 



3.2.2 Model 2 



Grav 



3 



19961 notes that the conditional density of the return process, yt, of the switching 



GARCH model is a mixture of normal distribution with time-varying parameters. Hence, 
he suggests the use of the variance of the conditional density o'^Q^f_i of yt as a proxy for the 
lagged of the conditional variance Ct-i switching GARCH process i.e. 



e(G)t-i = yt-i - M(G)t-i 

Ai(G)t-l = Ai(B)t-l 

'^IgY-i = y (.yt-i\yi:t-2) = V {E[yt-i\yi:t-2,i't-i]\yi:t-2) + E[V {yt-i\yi:t-2, it-i) \yi-t 

= y(/it_l(yl:t_2,^^_l0^)|2;l:t-2) + S[at2^l(yl:t-2,^^_10a)|2;l:t-2] 

- E[{lit-i{yi:t-2,i't-iO^)f\yi.,t-2\ - {E[iH-i{yi:t-2,i't-iO^)\yi.,t-2\f + (^lB)t-l 
M 

= = e'Jyi:t-2) - i^^{B)t-i)^ + crfB)t-i- 



Similarly, as in model 1, information on yt-i is lost in this approximation scheme as 
and (^'^Q'ft^i arc functions of yi:t-2- By recursion, ct^q^^.x can be computed since c^^^^.x 
depends on cr'^Q-^t_2 through (T^_x(?/i:t_2, ej^^^). Within this framework the conditioning is 
also on yi:t-2- The major difference between Model 1 and 2 can be seen from the development 
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of the proxy i.e V{et-i\yi:t-2) is replaced with V{yt-i\yi:t-2) in model 2. 



3.2.3 Model 3 



In the previo us approximation schemes, the information coming from yt-i is not used. 



Duekei 



1997| suggests that yt^i should be included in the conditioning set of the proxy 
while assuming that ^t-i and cr^_i are functions of {yi-t-^2i Ct-2)- The following relation can 
thus be credited to him 



e(D)t-i = yt-i - Ai(D)t-i 
M(D)t-i = E[fXt-i{yi.,t-2,Ct-2&t,)\yi:t-i] = Mt-i(yi:t-2,e;,6l^)g($^„2 = e'Jyi-.t-i) 



M 



m—1 

M 



'^{D)t-i = E[a^_j^{yi..t-2,Ct-2^a)\yi:t-i] = '^t-iiyi-t-2,e'^9a)q{^t_2 = e'Jyi..t-i). 



m—l 



The probability = £m\yi--t) is a one period ahead smoothed probability which can be 

computed as: 



M 



i=l 
M 

i=l 
M 



M 

E 



M 



= (^L\yi:t-i)Yl 



g(Ct = e»l^^-i = e'^,yi:t-i)q{^'i ^ e^l^i:*) 
'Z(C^ = e:;|2/l:t-l) 



Within this framework we note that the conditioning is on yi-.t-i while the functional form 
depends on (yi:t_2, ^^_2)- We equally note that at every time step t the value of q{Ct-2 ~ 
s'm\yi:t-i) for all m is required for computation. 



3.2.4 Model 4 

The following approximation is similar to model 3. As opposed to model 3, we assume 
that fit-i and (J^_^ are functions of {yi:t-2,Ct-i)- This modification leads to the following 
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approximation iKlaassenl 2002| model 



e(Sif)t-i = yt-i — fJ'{SK)t~i 

M(Sif)t-l = E[nt-l{yi:t-2,^t-lSt^)\yi:t^l] = ^*t"-l(yl:t-2,e^„6'A')'^(^^-l = e'^\yi:t-l) 



m— 1 

M 



In the next approximation, the current regime will be added to the conditioning set of this 
version of the auxiliary mo del. Hence, this approximation will be identified as the simplified 



version of 



Klaassen 



2002 



model. In order to implement this approximation scheme the 



value of q{^'^_^ ~ e^j|?/i:t_i) for all m is required at each point in time t. 



3.2.5 Model 5 



In each of the approximations described above, info rmation relating to the current regime 



is ignored in the conditioning set. On observing this, 
approximation 

^{K)t-i = yt-1 - tJ-i,(K)t-i 
M»,(if)t-i = E[fit-i{yi:t-2, (,'t-idt,)\yi:t-iAt = e^] 

M 



Klaassen 



2002l | suggests the following 



'i,{K)t-l 



Y ^^t-l{yl■.t-2,e'^e^)q{^'^_^ = e',Jyi..t-i,^'t = e^) 

m—1 

E[al,{y,..t-2,et-iOa)\yi:t-i] 

Y (Mt-i(yi:t-2,e^j6l^) + crt2_i(2;i:t-2,e'„6l^)) ^ e'„,\yi..t-i,£.'t = e-) 

m—1 

^^t-liyl■.t-2,e'^0f,)q{^t_l = e'^\yi.,t-i,^i ^ e-)^ , 



where 



= e'Jyi:t-i,^'t = e'i) 



^ g(Ct = e-|yi:t-l,Ct-l = KnM^'t-l ^ e'r,i\yi:t-l) 

g(e^ = e^|yi:t-i) 

Note that this approximation requires the computation of q{^'^_i — e^|?/i:t_i, = e'^) for 
all m and i at time t. 
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3.3 Sampling the 6 



Sampling 9 from the full conditional distribution will be done by separating the parameters 
of the transition matrix from the GARCH parameters. We assume that the parameters of 
the transition probabilities are independent of GARCH parameters. 



3.3.1 Sampling transition probability parameters 

The posterior distribution of 9,^ is given by 

f{O^\^l■.T,0^.,0a,yl■.T) cc f{^i:T,e^,9,,y^.,T%)f{9^) 

^f{^i:T,vi:T\e)f{e.) 

T / M \ 

t=2 \i=l / 
T I M / M ^ 

t=2 \ i=l \j=l 



(18) 



M M 



j=ii=i 



where riij is the number of times = S.jt-i = 1 for i, j = 1, . . . , M. It is easy to show that by 
substituting, as defined earlier, the conjugate Dirichlet prior for the transition probabilities, 
9Tr, in ([T^ we obtain 



M 



f{0-^\£,i:T,df_„9^,yi.,T)^ Y[ Dirichlet (ni„i + 77i„, ... ,nMrn + ?7A/rn)- (19) 

m— 1 

3.3.2 Sampling GARCH parameters 

Given a prior density f{9^,9c), the posterior density of (9^,9^) can be expressed as 

T 

/(^M.^-l6:T,e.,2/l:T)«/(ep,0.)^■^(/^*(yl:*-l'^^^M)><^?(2/l:*-l>^^^-)) (20) 

t=l 

For this step of the Gibbs sampler, we apply adaptive Metropolis-Hastings (MH) sampling 
technique since the full conditional distribution is known to be non-standard. Details can 
be found, as required, in the appendix. 
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4 Illustration with simulated data set 



We generated a time series of length 1500 from the data generating process corresponding to 
the model defined by equations Q and Q for two regimes {M ~ 2), time invariant transition 
probabilities and constant parameter switching conditional mean. The parameter values for 
the simulation exercise are set at: fi = {fii,^2) ~ (0.06,-0.09), 7 = (71,72) = (0.30,2.00), 
a = (ai,a2) = (0.35,0.10), /3 = (/3i,/32) = (0.20 0.60), tth = 0.98, 7r 22 = 0.96. These 



Bauwens et al 



3i 



2OIOII in a similar Monte 



parameter values corresponds to the choices made byU 
Carlo exercise. A relatively higher and more persistent conditional variance as compared to 
the first GARCH equation is implied by the second GARCH equation. Also, the transition 
probabilities of remaining in each regime is close to one. A summary statistics of a typical 
series of length 1500 simulated from this DGP is reported in Table [1] , and in Figure [T] 
we display, respectively, the time series, kernel density estimate and the autocorrelation 
function (ACF) of the square of the same scries. The mean of the series is close to zero 
and the excess kurtosis is estimated to be 3.57. For each hidden state sampling algorithm 

Table 1: Descriptive statistics for simulated data. 



Min. max. 


Mean 


Std. 


Skcwncss 


Kurtosis 


-6.9540 10.7600 


-0.0042 


1.5740 


0.04120 


6.5659. 



described in Section 3.1 and the auxiliary models presented in Section 3.2, we perform 10000 
Gibbs iterations and compare estimates from these schemes with estimates obtained using 
the single-move sampling scheme for the hidden states. To carry out the MCMC exercise, we 
set the initial parameters of the algorithm to the maximum likelihood estimates of one of the 
MS-GARCH approximations described in Section 3.1 and randomly generated initial state 
trajectory. The hyperparameters of the prior distributions of the transition probabilities 
Vij for i,j = 1,2 are set to 1 while the support for other parameters are given in the table 
reporting their parameter estimates. The case of two trials, {K = 2), is considered within the 
different multi-point sampling strategies discussed earlier. Table from [2] to [6] highlights the 
posterior means and standard deviation of the parameters and the transition probabilities 
of the MS-GARCH under each of the auxiliary models used in constructing proposals for 
the hidden states. Column 4 of each of these tables reports the parameter estimates and 
transition probabilities obtained by using the single move technique for sampling the state 
variables within the Gibbs algorithm while in columns 5 to 7 we present the result obtained 
using the different multi-move multipoint sampling techniques within the Gibbs algorithm. 

With the exception of a few, the posterior means under the multi-move multi- 
point sampling schemes relative to the single-move technique have more values within one 
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Tabic 2: Estimated parameter value and posterior statistics using Model 1. 





DGP Values 


Prior supports 


Single Move 


Multi move 


MTM 


MTMIS 


MCTM 




0.980 


(0.00 1.00) 


0.968 


0.972 


0.974 


0.977 








(0.014) 


(0.005) 


(0.006) 


(0.005) 




0.960 


(0.00 1.00) 


0.995 


0.952 


0.955 


0.957 








(0.002) 


(0.011) 


(0.011) 


(0.009) 


Ml 


0.060 


(0.02 0.15) 


0.099 


0.045 


0.049 


0.046 








(0.031) 


(0.017) 


(0.019) 


(0.0173) 




-0.090 


(-0.35 0.18) 


-0.013 


-0.109 


-0.107 


-0.110 








(0.035) 


(0.106) 


(0.108) 


(0.107) 


71 


0.300 


(0.15 0.45) 


0.290 


0.345 


0.365 


0.350 








(0.053) 


(0.046) 


(0.046) 


(0.047) 


72 


2.000 


(0.50 4.00) 


0.508 


1.682 


2.042 


2.533 








(0.010) 


(0.432) 


(0.599) 


(0.650) 


ai 


0.350 


(0.10 0.50) 


0.227 


0.141 


0.181 


0.180 








(0.099) 


(0.037) 


(0.049) 


(0.044) 


Oil 


0.100 


(0.02 0.35) 


0.331 


0.042 


0.047 


0.047 








(0.016) 


(0.019) 


(0.023) 


(0.024) 


/3i 


0.200 


(0.05 0.40) 


0.190 


0.248 


0.196 


0.227 








(0.097) 


(0.082) 


(0.076) 


(0.079) 




0.600 


(0.35 0.85) 


0.510 


0.683 


0.612 


0.534 








(0.019) 


(0.084) 


(0.109) 


(0.111) 



Table 3: Estimated parameter value and posterior statistics using Model 2. 





DGP Values 


Prior supports 


Single Move 


Multi move 


MTM 


MTMIS 


MCTM 


TTll 


0.980 


(0.00 1.00) 


0.968 


0.973 


0.9753 


0.9771 








(0.014) 


(0.006) 


(0.006) 


(0.006) 


7r22 


0.960 


(0.00 1.00) 


0.995 


0.952 


0.952 


0.957 








(0.002) 


(0.011) 


(0.011) 


(0.010) 


Ml 


0.060 


(0.02 0.15) 


0.099 


0.045 


0.047 


0.048 








(0.031) 


(0.017) 


(0.018) 


(0.018) 


M2 


-0.090 


(-0.35 0.18) 


-0.013 


-0.108 


-0.111 


-0.120 








(0.035) 


(0.107) 


(0.111) 


(0.109) 


71 


0.300 


(0.15 0.45) 


0.290 


0.344 


0.328 


0.347 








(0.052) 


(0.046) 


(0.052) 


(0.052) 


72 


2.000 


(0.50 4.00) 


0.508 


1.701 


1.923 


1.968 








(0.009) 


(0.442) 


(0.626) 


(0.673) 




0.350 


(0.10 0.50) 


0.228 


0.142 


0.181 


0.186 








(0.098) 


(0.039) 


(0.042) 


(0.044) 




0.100 


(0.02 0.35) 


0.331 


0.042 


0.043 


0.044 








(0.016) 


(0.019) 


(0.021) 


(0.022) 




0.200 


(0.05 0.40) 


0.190 


0.250 


0.275 


0.237 








(0.096) 


(0.079) 


(0.084) 


(0.086) 




0.600 


(0.35 0.85) 


0.511 


0.681 


0.645 


0.631 








(0.019) 


(0.085) 


(0.117) 


(0.1216) 



posterior standard deviation away from the DGP values. In Figure from [2] to [5] we report 
the posterior densities of the parameters using single-move, MTM, MTMIS, and MTCM 
sampling strategies respectively. The multi- move sampler are constructed using model 5. 
The shapes of the posterior densities are unimodal, thus ruling out label switching problem. 
We also examine the performance of our multi-move multipoint algorithms relative to the 
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Tabic 4: Estimated parameter value and posterior statistics using Model 3. 





DGP Values 


Prior supports 


Single Move 


Multi move 


MTM 


MTMIS 


MCTM 


TTll 


0.980 


(0.00 1.00) 


0.968 


0.975 


0.976 


0.977 








(0.014) 


(0.005) 


(0.006) 


(0.006) 


71'22 


0.960 


(0.00 1.00) 


0.995 


0.956 


0.956 


0.956 








(0.002) 


(0.009) 


(0.011) 


(0.011) 


Ml 


0.060 


(0.02 0.15) 


0.099 


0.050 


0.050 


0.049 








(0.031) 


(0.018) 


(0.019) 


(0.018) 




-0.090 


(-0.35 0.18) 


-0.013 


-0.128 


-0.122 


-0.116 








(0.034) 


(0.104) 


(0.106) 


(0.108) 


71 


0.300 


(0.15 0.45) 


0.290 


0.382 


0.371 


0.354 








(0.052) 


(0.043) 


(0.046) 


(0.051) 


72 


2.000 


(0.50 4.00) 


0.508 


2.107 


2.059 


2.448 








(0.009) 


(0.641) 


(0.648) 


(0.712) 


"1 


0.350 


(0.10 0.50) 


0.227 


0.168 


0.174 


0.167 








(0.098) 


(0.042) 


(0.047) 


(0.047) 


Q!2 


0.100 


(0.02 0.35) 


0.331 


0.046 


0.046 


0.048 








(0.016) 


(0.023) 


(0.022) 


(0.025) 


/3i 


0.200 


(0.05 0.40) 


0.190 


0.173 


0.199 


0.237 








(0.096) 


(0.076) 


(0.081) 


(0.089) 




0.600 


(0.35 0.85) 


0.510 


0.603 


0.613 


0.547 








(0.019) 


(0.114) 


(0.117) 


(0.119) 



Table 5: Estimated parameter value and posterior statistics using Model 4. 





DGP Values 


Prior supports 


Single Move 


Multi move 


MTM 


MTMIS 


MCTM 


TTll 


0.980 


(0.00 1.00) 


0.968 


0.978 


0.977 


0.977 








(0.014) 


(0.005) 


(0.006) 


(0.005) 


7r22 


0.960 


(0.00 1.00) 


0.995 


0.959 


0.958 


0.957 








(0.002) 


(0.010) 


(0.010) 


(0.011) 


Ml 


0.060 


(0.02 0.15) 


0.099 


0.049 


0.048 


0.050 








(0.031) 


(0.019) 


(0.018) 


(0.019) 


M2 


-0.090 


(-0.35 0.18) 


-0.013 


-0.121 


-0.117 


-0.134 








(0.034) 


(0.109) 


(0.108) 


(0.108) 


71 


0.300 


(0.15 0.45) 


0.290 


0.362 


0.366 


0.370 








(0.052) 


(0.045) 


(0.046) 


(0.0469) 


72 


2.000 


(0.50 4.00) 


0.508 


2.519 


1.931 


2.173 








(0.009) 


(0.683) 


(0.648) 


(0.665) 




0.350 


(0.10 0.50) 


0.227 


0.170 


0.179 


0.172 








(0.098) 


(0.041) 


(0.050) 


(0.044) 




0.100 


(0.02 0.35) 


0.331 


0.046 


0.046 


0.046 








(0.016) 


(0.023) 


(0.022) 


(0.023) 




0.200 


(0.05 0.40) 


0.190 


0.230 


0.204 


0.205 








(0.096) 


(0.082) 


(0.077) 


(0.082) 




0.600 


(0.35 0.85) 


0.510 


0.539 


0.633 


0.594 








(0.019) 


(0.113) 


(0.116) 


0.1157 



single-move strategy by computing the percentage of correctly specified regimes. To do this, 
we first calculate the average of the Gibbs output on the state variables and then assign 
mean states greater than one-half to regime 2 (and regime 1 otherwise). We find out that 
the single-move technique is able to classify 43% of the data correctly while the multi-move 
multipoint samplers classified between 93% and 96% of the data correctly. The acceptance 
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Tabic 6: Estimated parameter value and posterior statistics using Model 5. 





DGP Values 


Prior supports 


Single Move 


Multi move 


MTM 


MTMIS 


MCTM 


TTll 


0.980 


(0.00 1.00) 


0.968 


0.974 


0.976 


0.976 








(0.015) 


(0.006) 


(0.006) 


(0.006) 


71'22 


0.960 


(0.00 1.00) 


0.995 


0.954 


0.957 


0.957 








(0.002) 


(0.012) 


(0.011) 


(0.011) 


Ml 


0.060 


(0.02 0.15) 


0.099 


0.050 


0.049 


0.050 








(0.031) 


(0.019) 


(0.018) 


(0.019) 




-0.090 


(-0.35 0.18) 


-0.013 


-0.127 


-0.124 


-0.123 








(0.035) 


(0.107) 


(0.108) 


(0.105) 


71 


0.300 


(0.15 0.45) 


0.290 


0.368 


0.373 


0.378 








(0.053) 


(0.045) 


(0.046) 


(0.045) 


72 


2.000 


(0.50 4.00) 


0.508 


1.869 


1.864 


2.069 








(0.010) 


(0.694) 


(0.679) 


(0.629) 


cti 


0.350 


(0.10 0.50) 


0.227 


0.172 


0.171 


0.177 








(0.098) 


(0.044) 


(0.044) 


(0.046) 


Q!2 


0.100 


(0.02 0.35) 


0.331 


0.045 


0.045 


0.047 








(0.016) 


(0.022) 


(0.022) 


(0.024) 


/3i 


0.200 


(0.05 0.40) 


0.190 


0.200 


0.194 


0.183 








(0.096) 


(0.079) 


(0.079) 


(0.079) 




0.600 


(0.35 0.85) 


0.510 


0.648 


0.648 


0.608 








(0.019) 


(0.126) 


(0.123) 


(0.116) 



rate of the the multi-move multipoint proposals varies between 1% and 20% with the highest 
arising from multipoint sampling schemes proposal distribution constructed using model 5. 
We compute the mean squared error (MSE) of the posterior means of parameter relative to 
the true parameter to further quantify our estimators i.e. 

ri 

MSE = -Y^{e^-e,f (21) 

i=l 

where n is the number of parameters, di is the parameter estimate of the i-th element, 0^, 
of the DGP parameter set. The result of this exercise is reported in Table [71 From Table [71 
the low MSE of our multipoint sampling schemes further confirms their superiority over the 
single-move procedures. The inefficiency of the various multi-move multiple-try Metropolis 



Table 7: Mean Squared Error (MSE). 





Single move 


MTM 


MTMIS 


MCTM 


Model 1 


0.2310 


0.0160 


0.0038 


0.0324 


Model 2 


0.2310 


0.0147 


0.0047 


0.0036 


Model 3 


0.2310 


0.0056 


0.0044 


0.0245 


Model 4 


0.2310 


0.0315 


0.0043 


0.0071 


Model 5 


0.2310 


0.0060 


0.0062 


0.0045 



samplers relative to the single-move sampler are further assessed by examining how much the 
variance of the parameters are increased due to autocorrelation coming from the sampler. 
Let z^^\ . . . , z^*^' denote a sample from the posterior distribution of a random variable Z . 
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Figure 2: Posterior densities of the MS-GARCH parameters using single-move Scheme 
Then inefficiency factor (IF) is evaluated by 



L 

IF =\ + 2Y,wiPi (22) 

1=1 

where p;, I ~ 1,2,... is the autocorrelation function of z'^^\ . . . , z'^'^^ at lag I and wi is 
the associated weight. If the samples are independent then IF = 1. li A and B are two 
competing algorithm with inefficient factor IFa and IFb respectively then we define the 
relative inefficiency [RI] as: 

_ TimeA ^ IFa 
TimeB IFb 

where TimeA and TimeB corresponds to the computing times of each algorithm. Rl mea- 
sures the factor by which the run-time of algorithm A must be increased to achieve algorithm 
B's precision; values greater than one suggests that algorithm B is more efficient. We pro- 
vide in Table from to [T^] the Rl for various multi-move multipoint algorithms relative to the 
single-move sampling strategy. The number of lags over which we calculate the RI is fixed 
a,t L — 500. From these tables our multi-move multipoint algorithms arc more efficient than 
the single-move sampling technique for the state variable. This is despite the low accep- 
tance rate of the of the m ultipoint proposals. Finally we shall notice that, as discussed in 



Craiu and Lemieux 



2007l | , a larger number of proposals is required to observe an appreciable 
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Figure 3: Posterior densities of the MS-GARCH parameters using MTM with model 5 
difference in the efficiency of the MCTM over the standard MTM. 

Table 8: Relative inefficiency factor using Model 1 





MTM 


MTMIS 


MCTM 


maxt=i:T(CTt ) 


68.16 


95.79 


92.48 




64.31 


60.37 


139.11 


7r22 


53.93 


65.52 


115.91 


Ml 


119.42 


105.59 


153.58 




78.08 


62.13 


107.04 


71 


45.96 


77.43 


66.18 


72 


14.69 


17.57 


15.29 


ai 


77.54 


136.39 


206.11 


a2 


42.54 


64.04 


71.15 


/3i 


44.76 


89.79 


76.29 


132 


26.05 


32.98 


29.96 
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Figure 4: Posterior densities of the MS-GARCH parameters using MTMIS with model 5 



Table 9: Relative inefficiency factor using Model 2 





MTM 


MTMIS 


MCTM 




72.08 


93.97 


95.35 


TTll 


54.26 


71.36 


82.63 




53.43 


60.85 


86.16 


Ml 


125.27 


124.69 


156.10 


M2 


81.05 


78.37 


66.96 


71 


50.08 


53.53 


55.99 


72 


15.11 


16.20 


14.21 


ai 


76.74 


238.36 


202.02 


a2 


45.30 


58.35 


60.34 


Pi 


49.03 


62.00 


63.08 


P2 


26.94 


28.97 


26.60 



Table 10: Relative inefficiency factor using Model 3 





MTM 


MTMIS 


MCTM 


maxt^i.T(CTt ) 


66.64 


94.80 


90.29 




55.04 


51.68 


58.42 


7r22 


63.59 


62.76 


49.31 


Ml 


96.03 


107.90 


147.03 


M2 


50.53 


71.94 


84.67 


71 


49.08 


72.63 


55.65 


72 


10.64 


15.14 


13.64 


ai 


129.17 


142.76 


114.75 


0-2 


39.85 


60.02 


61.12 


Pi 


50.69 


75.29 


59.40 


h 


19.97 


28.55 


26.43 
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Figure 5: Posterior densities of the MS-GARCH parameters using MCTM with model 5 
Table 11: Relative inefficiency factor using Model 4 





MTM 


MTMIS 


MCTM 


maxt^i.T(crt ) 


74.01 


96.79 


94.01 


TTll 


44.37 


62.01 


77.53 


7r22 


68.24 


76.50 


59.64 


Ml 


97.07 


156.67 


142.73 


M2 


60.36 


71.65 


50.81 


71 


58.35 


75.87 


65.73 


72 


11.15 


15.45 


15.35 


ai 


174.85 


129.64 


180.54 




50.28 


59.96 


63.24 




53.23 


83.88 


68.51 


Pi 


22.25 


28.95 


29.81 



Table 12: Relative inefficiency factor using Model 5 





MTM 


MTMIS 


MCTM 


maxt^i.T(CTt^) 


69.05 


92.88 


114.51 




41.02 


71.41 


64.78 


7r22 


47.10 


73.47 


69.97 


Ml 


96.93 


135.98 


157.25 


M2 


46.60 


67.22 


81.80 


71 


55.87 


75.55 


80.21 


72 


9.39 


14.55 


16.68 




125.95 


185.58 


179.61 




41.49 


57.63 


56.37 


Pi 


57.95 


83.33 


85.43 


P2 


17.35 


26.76 


30.32 
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5 Conclusion 



In this paper we deal with the chahenging issue of efficient samphng algorithm for Bayesian 
inference on Markov-switching GARCH models. We provide some new algorithms based on 
the combination of multi-move and multi-points strategies. 



More specifically, we apply the multiple-try sampler of iCraiu and Lemieuxl [LjUUYlj com 
bined with multi-move Gibbs sampler to Markov-switching GARCH models. For generating 
correlated proposal, we introduce antithetic Forward Filtering Backward Sampling (FFBS) 



algori thm for MS-GARCH based on the permuted displacement method of 



of 



Grain and Mens 



2005 1 - Our algorithms also extend to Markov-switching state space models the algorithms 



So 



2006l | for continuous state space models. 



From the results of our computational exercise, we observed a substantial gain in the 
efficiency of our Gibbs samplers over the usual single-move sampling algorithm for estimating 
the parameters of the MS-GARGH model. We also observed low acceptance rate — 20%) 
for the multipoint proposals. Despite the low acceptance rate for the multipoint proposals, 
we still have good results considering the l ength of the time series (1500) used. We expect 



that using the blocking scheme (as in 



So 



2003) the efficiency and the acceptance rate of 
can our sampling procedure may increase. The issues of the choice of block length and of 
the application of the inference procedure to real data could be a matter of future research. 
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Appendix 



Constructing proposal distribution for O^^Oa 

Sample 9^f^\9^j'^ from /(6'^, Oa\£}i)p, t^'^^^Ui-.t)- Given a prior density 9a), the posterior 
density of 09 ^^e^ can be expressed as follows 

T 

t=i 

where, 



In order to generate 9^,9u from the j oint distribution we apply a further blocking of 



the Gibbs sampler. First, in the spirits of 



Friihwirth-Schnatter 



2006 we consider the full 



conditional distributions of the regime-specific parameters, and secondly, we split the regime- 
dependent parameters in two subvectors, the parameter of the observation equation and 
the parameters of the volatility process. As regards the parameters of the return process 
equation. 



where ^i-k = (/xi, . . . , /x^-i, ^ifc+i, • • • , A^a/)', Tk ^ {t ^ I, . . . ,T\(}^'' 



.k.t - 1}, V = {t ^ 

1, . . . , T\^l^l = 0, Ci'^t-i = !}• It is not possible to simulate exactly from the full conditional 
distribution of fi^, k = 1, . . . , A/ given the other parameters and the allocation variables, 
thus we apply a MTM step with independent normal proposal distribution. Focusing on the 
first term of the full conditional 

tS/fc VI \ talk tS/fe t6/fc / ■> 

and if an approximation (tJ"^ of ctj is available, then it is possible to approximate this part 
of the full conditional with a normal distribution with mean and variance 
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respectively, where 



with = (/i^, . . . , /iJI/), /i* = StgT] ^''^'i ~ SteT} ^i.*- This normal can be used 
as proposal in the MH step. 

As regards the parameters of the volatility process the full conditional is 



(25) 



where 7_fc ^ (71, . . . , 7fc_i, 7^+1, . . . , 7m), /3-fc (/3i, . . . , . . . , /3m) and a^k = 

(ai, . . . , ak-i, ckfe+i, • ■ • , Q;m)- We now follow the ARMA approximation of regime specific 
GARCH process i.e. 



Let 



with 



£^t-iK] = 0; and Vart-i[wt] = 2af 



Subject to the above and following 



Nakatsuma 



1998l | suggestion, we assume that Wf 



J\f{0, "icrf). Then we have an "auxiliary" ARM A model for the squared error . 



i.e. w; =e\- - (C;a)e?_i " (^^/3)(e?-l " 



Following 



Ardia 



(26) 



2008l | we further express w\ as a linear function of (3 x 1) vector of 



(7fc, Qffe, /3fc)'. To do this, we approximate the function w\ by first order Taylor's expan- 
sion about (7^'~"'"-', Q;[r~^\ /S^.''""'"'')'. 



« wT = - ((7.,«.,/3,) - {^t-"\^t"\fit-H^t 
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where 



dwl 
dak 

^""^ -~^tk{eU~wU) + m 



-^tk^t-\ 



dak 



d/3k 

dwt dwt dwf 



dPk 



djk ' dak ' dpk J ' (7.=7r '."^="L ^/3'==/3^ ') 



Upon defining 



' "fc ' l^k 



Pk j'^t, it turns out tlrat 



w = {wl* 



(7, a, /3)Vt. Furthermore, by defining the T x 1 vectors 

. . , wj*)', r* = (r^, . . . , r^y and V = (Vi, . . . , Vt)' as well as a T x T matrix 



V = 2 



**4 







**4 , 

"t I 



With = (^^^'7'^-^^) + - d:^^-^)^ + {^f ii^^-^^)oi% 

we can approximate the full conditional probability of the regime specific volatility param- 
eters as 



/(7fc,^fc,afc|^^yT,7-fe,^-fc,a-fc,/i^''^^/l:T) oc -f^r exp 



(r) 



1 



w'V-^w 

7fc>0,afc>0,^ifc>0 



(27) 



where 



To sample for the truncated multivariate N ormal distribut ion given in equation (j27p . we 
implement the Gibbs sampling technique by 



Wilhelm 



2012| for sampling from a truncated 



multivariate Normal distribution. 
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